Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query-by-string word spotting method. Both the documents and the query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a pyramidal histogram of characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images, along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. On top of that, we propose an integral image representation of the document, using a simplified version of the attribute model, for efficient computation. Finally, we introduce a re-ranking step to boost retrieval performance. We show state-of-the-art results for segmentation-free query-by-string word spotting on single-writer and multi-writer standard datasets.
Comment: To be published in ICDAR201
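The PHOC descriptor at the heart of this method can be sketched in a few lines. This is a minimal illustration: the alphabet, pyramid levels, and the 50%-overlap assignment rule below are common choices for PHOC-style descriptors, not necessarily the paper's exact configuration.

```python
# Illustrative alphabet; the paper's exact character set is an assumption here.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def phoc(word, levels=(1, 2, 3)):
    """Binary pyramidal histogram of characters for a non-empty
    lowercase string: at each pyramid level, the word is split into
    equal regions, and a character is marked present in a region if
    at least half of its normalized position interval falls inside."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                # normalized occupancy interval of the i-th character
                c_lo, c_hi = i / n, (i + 1) / n
                overlap = min(hi, c_hi) - max(lo, c_lo)
                if ch in ALPHABET and overlap >= (c_hi - c_lo) / 2:
                    bits[ALPHABET.index(ch)] = 1
            vec.extend(bits)
    return vec
```

Because the descriptor depends only on the string, the same vector space can hold both predicted image attributes and string labels, which is what makes query-by-string possible.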
Hierarchical multimodal transformers for Multi-Page DocVQA
Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA considers only single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed together. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods in processing long multi-page documents. The proposed method is based on a hierarchical transformer architecture in which the encoder summarizes the most relevant information of every page and the decoder then takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and to provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
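The hierarchical encode-then-decode flow can be sketched as a toy: mean-pooling stands in for the learned T5 page encoder, and a norm heuristic stands in for the decoder's answer-page prediction. Every function name and heuristic here is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def encode_page(page_tokens, k=2):
    """Toy stand-in for the page encoder: split the page's token
    vectors into k chunks and mean-pool each one, yielding k
    'summary' tokens (in Hi-VT5 these would be learned embeddings
    attached to special page tokens)."""
    chunks = np.array_split(page_tokens, k)
    return np.stack([c.mean(axis=0) for c in chunks])

def answer_page(pages, k=2):
    """Concatenate every page's summary tokens, as the decoder would
    see them, and pick the answer page as the one whose pooled
    summary carries the most signal (largest norm, in this toy)."""
    summaries = [encode_page(p, k) for p in pages]
    page_strength = [np.linalg.norm(s.mean(axis=0)) for s in summaries]
    return int(np.argmax(page_strength))
```

The key design point survives the simplification: the decoder never sees full pages, only a fixed number of summary tokens per page, so the context length grows slowly with the number of pages.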
Report on the Second Symbol Recognition Contest
http://www.springer.com/lncs
Following the experience of the first edition of the international symbol recognition contest, held during GREC'03 in Barcelona, a second edition was organized during GREC'05. In this paper we first recall the general principles of both contests before presenting the details of this latest edition. In particular, we describe the dataset used in the contest, the methods that took part in it, and an analysis of the results obtained by the participants. We conclude with a synthesis of the contributions and shortcomings of these two editions, and some directions for the organization of a forthcoming contest.
Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch
In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets.
Comment: Accepted at ICPR 201
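Once such a network has projected queries and images into the common space, retrieval itself reduces to nearest-neighbour search. A minimal sketch, assuming the embeddings are already computed (the function name and toy vectors are illustrative, not the paper's code):

```python
import numpy as np

def retrieve(query_emb, image_embs, top_k=2):
    """Rank database images by cosine similarity to a query embedding.
    Assumes both sides already live in the learned common space --
    the hard part, which the cross-modal network provides."""
    q = query_emb / np.linalg.norm(query_emb)
    db = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:top_k]
```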
Handwritten Word Spotting with Corrected Attributes
We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme, based on Canonical Correlation Analysis, to correct the attribute scores, which greatly improves the results on a challenging dataset. We test our approach on two public datasets, showing state-of-the-art results.
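The calibration idea can be sketched with a regularized least-squares map from raw attribute scores to label space, standing in for the paper's CCA step. `fit_calibration` and `lam` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fit_calibration(scores, phocs, lam=1e-3):
    """Ridge-regularized least-squares map from raw attribute scores
    to the binary label (PHOC-style) space -- a simplified stand-in
    for the paper's CCA-based calibration."""
    d = scores.shape[1]
    gram = scores.T @ scores + lam * np.eye(d)
    return np.linalg.solve(gram, scores.T @ phocs)

def calibrated_similarity(query_scores, db_scores, W):
    """Cosine similarity between calibrated query and database
    vectors; strings embed in the same space, so query-by-example
    and query-by-string use the same comparison."""
    q = query_scores @ W
    db = db_scores @ W
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return q @ db.T
```

The point of the calibration, here as in the paper, is to undo systematic distortions in the raw classifier scores before comparing them.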
Comparing Graph Similarity for Graphical Recognition
The original publication is available at www.springerlink.com. 8th International Workshop, GREC 2009, La Rochelle, France, July 22-23, 2009. Selected Papers.
In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. To this aim, different kinds of documents are used, including line drawings (symbols), ancient documents (ornamental letters), shapes, and trademark logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
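For illustration, here is one very simple graph dissimilarity of the kind such an evaluation might include as a cheap baseline; it is not one of the four measures compared in the paper.

```python
from collections import Counter

def degree_sequence(edges):
    """Sorted degree sequence of an undirected graph given as an
    edge list of (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values(), reverse=True)

def degree_distance(edges_a, edges_b):
    """L1 distance between zero-padded degree sequences: a crude
    structural dissimilarity that ignores labels and topology beyond
    degrees, which is exactly why more principled graph distances
    (edit distance, spectral measures, etc.) are worth comparing."""
    a, b = degree_sequence(edges_a), degree_sequence(edges_b)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))
```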